Using Quarto with Python

Lucas A. Meyer

2022-07-07

Why Quarto

The content value chain

Content stuck in my computer is nearly worthless.

Goal

  • Move good content out of my computer as fast as possible
  • Reproducible
  • Git-based collaboration
  • Write once, generate:
    • Code
    • Paper
    • PowerPoint
    • Site/documentation

Literate Programming

Literate programming can help create high-quality, reproducible, documented code.

Donald E. Knuth proposed literate programming in a 1984 article.

Jupyter implements the literate programming paradigm, but generating high-quality output requires additional tools.

I tried many tools for Literate Programming

LaTeX

  • Great for PDFs… Beamer for slides
  • Not great for websites
  • Dynamic content: coding LaTeX
    • \usepackage{ifthen}
    • @for, @while

Word/PPT

  • Hard to collaborate before O365
  • Hard to reproduce / auto-generate

Jekyll / GitHub Pages

  • Great for sites
  • Not great for slides, papers

Python notebooks

  • Great, with Pandoc

RMarkdown

  • Great for everything
  • Heavily dependent on R
  • New features in Quarto

Quarto

Quarto® is an open-source scientific and technical publishing system built on Pandoc.

The name comes from the format of a book or pamphlet printed with eight pages of text, four to a side, then folded twice to produce four leaves.

The earliest known European printed book, the Sibyllenbuch (Gutenberg, c. 1452), was done in the quarto format, as were many of Shakespeare’s plays.

With Quarto, you can:

  • Write Markdown and Python (or R, Julia, OJS)
  • Use equations, diagrams, citations, figures, etc.
  • Output articles, presentations, interactive websites…
  • Work with Jupyter Lab or VSCode

You can install Quarto on Linux, Windows and Mac.

Quarto workflow

The content pipeline for .ipynb

About 75% of data scientists use Python through Jupyter notebooks.

With some scripting, you can use pandoc to generate papers, HTML, PowerPoint, etc.

You just need to learn Pandoc.

graph LR
    A[.ipynb] --> B(("Pandoc"))
    B ----> E[.docx]
    B ----> H[.pptx]
    B --> C[.md]
    B --> D[.tex] 
    D --> F((Xetex))
    C --> I((Hugo))
    F --> G[.pdf]
    I --> J[.html]
    style B fill:#FF6655AA
    style F fill:#88ffFF
    style I fill:#88ffFF
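
To give an idea of the "some scripting" involved, here is a minimal sketch that builds one pandoc command per output. The notebook name `analysis.ipynb` and the helper `pandoc_command` are made up for illustration; `-o` and `--pdf-engine` are regular pandoc options. The sketch only prints the commands, so it runs even without pandoc installed.

```python
from pathlib import Path

# Output targets from the diagram, mapped to any extra pandoc flags.
# The target list and file names are assumptions made for this example.
PANDOC_TARGETS = {
    ".docx": [],
    ".pptx": [],
    ".md": [],
    ".pdf": ["--pdf-engine=xelatex"],
}


def pandoc_command(notebook: str, suffix: str) -> list[str]:
    """Build the pandoc command line that converts a notebook to one target."""
    output = Path(notebook).with_suffix(suffix)
    return ["pandoc", notebook, "-o", str(output), *PANDOC_TARGETS[suffix]]


for suffix in PANDOC_TARGETS:
    # Print instead of running. To execute a command for real, pass it to
    # subprocess.run(command, check=True).
    print(" ".join(pandoc_command("analysis.ipynb", suffix)))
```

Every new output type means another entry, another set of flags, and another thing to remember, which is the part Quarto takes over.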

Quarto in Python, in a nutshell

To use Quarto, you add some YAML (mostly Pandoc configuration) to the content you would put in an .ipynb file: the result is a .qmd file. This keeps the configuration and content in the same file.

You can then render the outputs using quarto render <file.qmd> in the command line.

graph LR
Q[.qmd] --> A
subgraph Quarto
    A[.ipynb] --> B(("Pandoc"))
    B --> C[.md]
    B --> D[.tex] 
    D --> F((Xetex))
    C --> I((Hugo))
    style B fill:#FF6655AA
    style F fill:#88ffFF
    style I fill:#88ffFF
end
    B ----> E[.docx]
    B ----> H[.pptx]
    F --> G[.pdf]
    I --> J[.html]
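
As a rough sketch of what "YAML plus content in one file" looks like, the snippet below writes a tiny .qmd to a temporary folder and prints the command that would render it. The helper `minimal_qmd`, the file name `hello.qmd`, and the slide text are all invented for this example.

```python
from pathlib import Path
import tempfile

# Built this way only to avoid nesting code fences inside this example.
FENCE = "`" * 3


def minimal_qmd(title: str, code: str) -> str:
    """Return a small .qmd: YAML front-matter, Markdown, and one Python cell."""
    return "\n".join([
        "---",
        f'title: "{title}"',
        "format: html",
        "---",
        "",
        "## Hello",
        "",
        "Some Markdown text.",
        "",
        FENCE + "{python}",
        code,
        FENCE,
        "",
    ])


# Write the file to a temporary folder and show the render command.
target = Path(tempfile.mkdtemp()) / "hello.qmd"
target.write_text(minimal_qmd("Quarto with Python", "print(1 + 1)"), encoding="utf-8")
print(f"quarto render {target}")
```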

Using Quarto

The YAML front-matter

Quarto files always start with a YAML front-matter.

The YAML configuration determines the output format of your document. A few popular output options are html, pptx, docx, and pdf.

You can use a single source file to generate multiple output types.

For example, the YAML below will generate a PowerPoint file and a Revealjs presentation.

---
title: "Quarto with Python"
format: 
  pptx:
    reference-doc: templates/template.pptx
  revealjs:
    incremental: false
    theme: pulse

author: Lucas A. Meyer
date: 2022-07-07
---
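
If you want to render only one of the formats listed in the YAML, `quarto render` accepts a `--to` option. A small sketch, assuming the source file is called `slides.qmd` (a made-up name) and using a hypothetical helper `render_commands`:

```python
def render_commands(source: str, formats: list[str]) -> list[list[str]]:
    """Build one `quarto render` invocation per output format."""
    return [["quarto", "render", source, "--to", fmt] for fmt in formats]


# Printed rather than executed, so the sketch runs without Quarto installed.
for command in render_commands("slides.qmd", ["pptx", "revealjs"]):
    print(" ".join(command))
```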

Main content

Writing content

Write content in Markdown.

Quarto’s Markdown supports figures, tables, bibliography, etc.

It also supports lots of extra features, like diagrams with mermaid and GraphViz, and even LaTeX equations:

\[ E = mc^2 \]

### Writing content

Write content in [Markdown].

Quarto's Markdown supports
figures, tables, bibliography, etc.       

It also supports lots of extra features, 
like diagrams with `mermaid` and 
`GraphViz`, and even LaTeX equations: 

$$
E = mc^2
$$

What if I want to add code?

The best thing about Quarto is that you can use it to run any code that you would be able to run in a Python notebook.


import numpy as np
import matplotlib.pyplot as plt

r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
fig, ax = plt.subplots(
    subplot_kw={"projection": "polar"})
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])
ax.grid(True)
plt.show()

Diagrams

You can use mermaid to create diagrams.

The diagrams in this and previous sections were created with mermaid.

flowchart TD

A[Hard] -->|Text| B(Round)
B --> C{Decision}
C -->|One| D[Result 1]
C -->|Two| E[Result 2]

Regression and results

Here’s an example of the first regression in Wooldridge’s Introductory Econometrics book:

\[ \text{wage} = \alpha + \beta_1 \times \text{educ} + \epsilon \]

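import pandas as pd
import statsmodels.formula.api as smf
from IPython.display import Markdown, display
# Assumption: md is an HTML-to-Markdown converter such as markdownify
from markdownify import markdownify as md

# Load the data
df_wage = pd.read_csv("data/wage1.csv")

# Create an OLS model using the R formula
# syntax, which assumes an intercept
mod = smf.ols(formula="wage ~ educ",
              data=df_wage)

# Fit the model
res = mod.fit()

# Show the coefficient table as Markdown
display(Markdown(md(res.summary().
        tables[1].as_html())))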

            coef      s.e.    t        p-value
Intercept   -0.9049   0.685   -1.321   0.187
educ         0.5414   0.053   10.167   0.000
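
To show where the coef and s.e. columns come from, here is a sketch of the closed-form estimates for a one-regressor model, in plain Python. It is not what statsmodels does internally, and the data is synthetic (generated with made-up "true" values of -0.9 and 0.54), not the wage1 dataset. The function name `simple_ols` is my own.

```python
import math
import random


def simple_ols(x: list[float], y: list[float]) -> dict[str, float]:
    """Closed-form OLS for y = alpha + beta * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    # Residual variance with n - 2 degrees of freedom
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_beta = math.sqrt(sigma2 / s_xx)
    se_alpha = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / s_xx))
    return {"alpha": alpha, "beta": beta,
            "se_alpha": se_alpha, "se_beta": se_beta}


# Synthetic data, NOT the wage1 dataset: true alpha = -0.9, true beta = 0.54
rng = random.Random(0)
educ = [rng.randint(8, 18) for _ in range(500)]
wage = [-0.9 + 0.54 * e + rng.gauss(0, 1) for e in educ]

fit = simple_ols(educ, wage)
print({name: round(value, 3) for name, value in fit.items()})
```

The t column in the table is just each coefficient divided by its standard error.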

Presentations in Quarto

Basic slide syntax

To create slides, you create sections with #, titles with ##, and bullets with -.

Content types

  • You can add several types of content
    • code (use backticks)
    • images
    • diagrams
    • tables
    • etc.

## Basic slide syntax

To create slides, you create sections 
with `#`, titles with `##`, and bullets 
with `-`.

### Content types

- You can add several types of content
    - code (use backticks)
    - images
    - diagrams
    - tables
    - etc.

Creating PowerPoint slides

To generate a PowerPoint presentation from a .qmd file, add format: pptx to the YAML front-matter.

Quarto will use the pandoc PowerPoint rules to render the content from the .qmd into .pptx.

The “pandoc rules” limit the flexibility to create PowerPoint presentations. Quarto has better presentation support for revealjs and beamer.

  • PowerPoint can use a template with only these layouts:
    • Title Slide
    • Title and Content
    • Section Header
    • Two Content
    • Comparison
    • Content with Caption
    • Blank

PowerPoint layout rules

The rules are available at:
https://pandoc.org/MANUAL.html#powerpoint-layout-choice

  • Title Slide: created from metadata fields like title and author
  • Section Header: created from the top-level headings (#)
  • Two Content: used when .md source contains :::: {.columns} and only text content
  • Comparison: same as “Two Content”, but at least one column contains text followed by non-text content (e.g. an image or a table)
  • Content with Caption: used when a slide without columns contains text followed by non-text content
  • Title and Content: whatever doesn’t fit the rules above.
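
One way to keep these rules straight is to read them as a chain of checks where the first match wins. The sketch below is my simplified reading of the list above, not pandoc's actual code: the `Slide` class and its fields are hypothetical, and the real rules in the pandoc manual are more detailed.

```python
from dataclasses import dataclass


@dataclass
class Slide:
    """Hypothetical summary of a slide's content (not a pandoc structure)."""
    from_metadata: bool = False      # built from title/author metadata
    top_level_heading: bool = False  # a `#` heading
    has_columns: bool = False        # source uses a `.columns` div
    has_non_text: bool = False       # images, tables, plots, ...


def choose_layout(slide: Slide) -> str:
    """Apply the rules in the order listed; the first match wins."""
    if slide.from_metadata:
        return "Title Slide"
    if slide.top_level_heading:
        return "Section Header"
    if slide.has_columns:
        return "Comparison" if slide.has_non_text else "Two Content"
    if slide.has_non_text:
        return "Content with Caption"
    return "Title and Content"


print(choose_layout(Slide(has_columns=True, has_non_text=True)))
```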

PowerPoint templates

By adding a reference-doc entry to your YAML, you can tell Quarto (and pandoc) to use a file as a template for the format of your presentation.

The “Slide Master” needs to contain layouts named as per the previous slide (e.g. “Comparison”).

This allows you a lot of flexibility in the design of your slide deck, even if it is for just the small number of layouts that were listed in the previous slide.

You can control fonts, add background images, page numbering, etc.

---
title: "Using Quarto for everything"
format: pptx
reference-doc: templates/template.pptx
author: Lucas A. Meyer
date: 2022-07-14
---
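
Before rendering, it can help to check that a template actually contains the layout names pandoc expects. The sketch below does that with the standard library only, relying on the fact that a .pptx is a zip archive whose layouts live under `ppt/slideLayouts/`. The helper names are my own, and I have only reasoned about this against the file format, so treat it as a starting point.

```python
import zipfile
import xml.etree.ElementTree as ET

# Layout names pandoc looks for, as listed earlier in this deck.
REQUIRED_LAYOUTS = {
    "Title Slide", "Title and Content", "Section Header", "Two Content",
    "Comparison", "Content with Caption", "Blank",
}
PML_NS = "{http://schemas.openxmlformats.org/presentationml/2006/main}"


def layout_names(pptx) -> set[str]:
    """Collect the layout names stored in a .pptx (path or file-like object)."""
    names = set()
    with zipfile.ZipFile(pptx) as archive:
        for member in archive.namelist():
            if member.startswith("ppt/slideLayouts/") and member.endswith(".xml"):
                root = ET.fromstring(archive.read(member))
                # The layout name is the `name` attribute of <p:cSld>
                c_sld = root.find(f"{PML_NS}cSld")
                if c_sld is not None and c_sld.get("name"):
                    names.add(c_sld.get("name"))
    return names


def missing_layouts(pptx) -> set[str]:
    """Return the required layout names that the template does not define."""
    return REQUIRED_LAYOUTS - layout_names(pptx)
```

With the template from the YAML above, the call would be `missing_layouts("templates/template.pptx")`; an empty set means every expected layout is present.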

Best feature: generate content dynamically

Let’s say you’re presenting a project about population dynamics but you don’t know which world leaders are coming to the conference.

On the presentation day, you learn that Belgium, China, Brazil, India, Japan and Nigeria are attending.

You can use Python or R to automatically generate slides.

Generating slides with Python

The next slides/sections were generated using the code below:

import pandas as pd
from IPython.display import Markdown, display

# Read data
df_dr = pd.read_csv("data/dr.csv.gz", compression="gzip")
df_pop = pd.read_csv("data/pop_brackets.csv.gz", compression="gzip")

# Configure parameters
years = [2000, 2025, 2050, 2075, 2100]
regions = ["Belgium", "China", "Brazil", "India", "Japan", "Nigeria"]

# The graph functions (plot_dependency_ratio and
# plot_population_pyramid_series) were imported earlier

for name in regions:
    # generate slide header
    display(Markdown(f"## Age and Population Pyramids for {name}"))

    # generate columns
    display(Markdown('<div class="columns">'))
    display(Markdown('<div class="column">'))
    # column 1 contains a dependency ratio plot
    plot_dependency_ratio(df_dr[df_dr["Location"] == name])
    display(Markdown('</div>'))
    display(Markdown('<div class="column">'))
    # column 2 contains a population pyramid
    plot_population_pyramid_series(df_pop[df_pop["Location"] == name], years)
    display(Markdown('</div>'))
    display(Markdown('</div>'))

Age and Population Pyramids for Belgium

Age and Population Pyramids for China

Age and Population Pyramids for Brazil

Age and Population Pyramids for India

Age and Population Pyramids for Japan

Age and Population Pyramids for Nigeria
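
The same idea works without any plotting code. Below is a stripped-down sketch that only builds the Markdown text for each slide, using pandoc's `:::: {.columns}` div syntax instead of raw HTML. The helper `slide_markdown` and the placeholder column text are invented for this example; in a real .qmd you would emit this text from a cell with the `output: asis` option or through `display(Markdown(...))` as above.

```python
def slide_markdown(region: str, left: str, right: str) -> str:
    """Return the Markdown for one two-column slide about `region`."""
    return "\n".join([
        f"## Age and Population Pyramids for {region}",
        "",
        ":::: {.columns}",
        "::: {.column}",
        left,
        ":::",
        "::: {.column}",
        right,
        ":::",
        "::::",
        "",
    ])


regions = ["Belgium", "China", "Brazil", "India", "Japan", "Nigeria"]

# One slide per region; the column text stands in for the real plots.
deck = "\n".join(
    slide_markdown(name,
                   f"Dependency ratio plot for {name}",
                   f"Population pyramid for {name}")
    for name in regions
)
print(slide_markdown("Belgium", "left column", "right column"))
```

Changing the attendee list means changing one line, and the deck follows.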

Generating a website

What I could get by just changing the format in YAML

---
title: "Quarto with Python"
format: html
    # revealjs:
    #     incremental: false
    #     theme: [simple, revealjs-customizations.scss]
    #     title-slide-attributes:
    #         data-background-image: images/data-viz-bg.jpg
    #         data-background-size: contain
    #         data-background-position: right

author: Lucas A. Meyer
date: 2022-07-14
---


Adding or changing the format to html will create a website.

Screenshot of website

Scholarly articles

Writing a scholarly article

I reused some of the content of this presentation to create two scholarly-looking articles. The purpose of the articles is just to show how easy it is to generate them with Quarto; they don’t contain original research.

The relevant files are:

Scholarly article screenshots

Citations and Footnotes

Citations don’t work in presentations, but are easy to add to documents.

You need a BibTeX file, e.g., references.bib, and a reference to it, bibliography: references.bib, in the YAML front-matter. Quarto supports any Citation Style Language (CSL) style.

You can cite by using [@citation-name] in your text. Please check the article .qmd source and the PDF and DOCX outputs.

Generating footnotes is also easy. Using [^ref] links to a footnote, and [^ref]: content of the footnote generates its content.
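
Here is a minimal sketch that puts the citation and footnote syntax together. It writes a one-entry references.bib (the Knuth article mentioned earlier) and a short article.qmd to a temporary folder. The citation key `knuth1984`, the footnote label `note`, and the file contents are choices made for this example.

```python
from pathlib import Path
import tempfile

# BibTeX entry for the Knuth article; the key "knuth1984" is arbitrary.
BIB = """@article{knuth1984,
  author  = {Knuth, Donald E.},
  title   = {Literate Programming},
  journal = {The Computer Journal},
  volume  = {27},
  number  = {2},
  pages   = {97--111},
  year    = {1984}
}
"""

# A short document with one citation and one footnote.
QMD = """---
title: "Citations and footnotes"
format: pdf
bibliography: references.bib
---

Literate programming was proposed decades ago [@knuth1984].

This sentence has a footnote.[^note]

[^note]: Content of the footnote.
"""

folder = Path(tempfile.mkdtemp())
(folder / "references.bib").write_text(BIB, encoding="utf-8")
(folder / "article.qmd").write_text(QMD, encoding="utf-8")
print(sorted(path.name for path in folder.iterdir()))
```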

Books in Quarto

Books in Quarto

You can also write books with Quarto. From the same collection of .qmd files, Quarto can generate:

  • ePub
  • PDF
  • Online book

Two recent examples are:

Hands-On Programming with R

Python for Data Analysis, 3E

Should I use Quarto?

Where I think Quarto is good (July 2022)

Articles: maybe

I think Quarto is more helpful for a team that already uses Git with Python notebooks or LaTeX to write articles. Microsoft Word collaboration through SharePoint and Teams is easier than Git and Quarto… but it’s not reproducible.

Python notebook: excellent

Quarto adds features to Python notebooks without taking anything away. All you need are a few YAML lines.

Blog: excellent

Quarto allows me to have a scriptable, Python-based blog. I can automate my blog to tweet and post to LinkedIn when I write new articles.

Presentations: special cases

For PowerPoint, render process = long edit cycle. Still useful when you have:

  • a lot of dynamic content
  • reproducibility needs
  • collaborators used to Git/Beamer

Quarto works great for RevealJS.

Next steps

{{< include slides/_next-steps.qmd >}}

THANK YOU